Analysis of the Impact of Wellspring’s Registration System Change on Member Engagement & an Attempt at Fitting a Recommendation Model for Programs

TUT0204-C

Group Members: Jingtian Hu, Mingheng Li, Mingxuan Jiang, Kairui Zhang

Introduction

  1. This project investigates how Wellspring’s service usage data reflects patterns of member engagement, demographic influences, and the effects of administrative system changes. We selected three research questions that target key aspects of Wellspring’s goals: increasing accessibility, promoting early engagement, and improving retention through data-driven strategies.

  2. To explore these questions, we used descriptive statistics, hypothesis testing, and a classification decision tree. Specifically, our methods include a bootstrap confidence interval for the median age of active members, a fitted decision tree for predicting favorite program type, and a permutation-based test of proportions for attendance behavior before and after a system update.

  3. Together, these analyses provide insight into the effectiveness of current outreach methods and areas where further optimization can benefit member participation.

Data summary for research question 1

Wrangling

  1. In the service table, I removed the subtitle from every service name (anything after a “:”, if present) so that services could be classified under fewer session names.
  2. Then, I grouped services by member_id and identified each member’s favorite session name, judged by number of registrations (not successful attendances).
  • In an intermediate step using GenAI, I had Google Gemini generate a mapping from session name to program type. I checked the classification; it made sense, though some assignments may still be debatable.
  3. I used this mapping to assign each member’s favorite session name to a concise program category.
  4. I joined the member background table with the table of each member’s favorite program and fitted a decision tree in R.
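The wrangling steps above can be sketched in Python as follows (the original work was done in R). The registration records, session names, and the SESSION_TO_CATEGORY mapping are hypothetical stand-ins for the service table and the reviewed Gemini mapping; breaking ties between equally registered sessions by first occurrence is an assumption.

```python
from collections import Counter, defaultdict

def base_session_name(service_name: str) -> str:
    """Drop any subtitle after the first ':' and trim surrounding whitespace."""
    return service_name.split(":", 1)[0].strip()

def favorite_sessions(registrations):
    """Map each member_id to the base session name with the most registrations.

    registrations: iterable of (member_id, service_name) pairs, one per registration.
    Ties keep the session registered first (Counter.most_common preserves insertion order).
    """
    counts = defaultdict(Counter)
    for member_id, service_name in registrations:
        counts[member_id][base_session_name(service_name)] += 1
    return {member: c.most_common(1)[0][0] for member, c in counts.items()}

# Hypothetical mapping standing in for the reviewed Gemini session -> category table
SESSION_TO_CATEGORY = {
    "Yoga": "Exercise",
    "Art Studio": "Creative Arts",
    "Peer Support": "Support Groups",
}

regs = [
    (1, "Yoga: Gentle Flow"),
    (1, "Yoga: Chair Yoga"),
    (1, "Art Studio"),
    (2, "Peer Support: Evening Group"),
]
fav_session = favorite_sessions(regs)
fav_category = {m: SESSION_TO_CATEGORY.get(s, "Other") for m, s in fav_session.items()}
print(fav_category)  # {1: 'Exercise', 2: 'Support Groups'}
```

Sessions missing from the mapping fall into a hypothetical "Other" category here; in practice they would need a manual check.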

Can we predict a member’s favorite program type from self-reported demographic information?

Research method:

Classification Decision Tree
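To illustrate how a classification tree chooses a split, here is a minimal Python sketch assuming a CART-style tree (such as R's rpart, which uses Gini impurity by default for classification). It only scores one-vs-rest splits on a single categorical predictor; real implementations also search numeric thresholds (e.g., on age_years) and groupings of categories, then recurse on each child. The toy rows and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_categorical_split(rows, labels, feature):
    """Score every 'feature == value' vs 'feature != value' split by the
    size-weighted Gini impurity of the two children; return the best (value, score)."""
    n = len(labels)
    best = None
    for value in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] == value]
        right = [y for row, y in zip(rows, labels) if row[feature] != value]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (value, score)
    return best

# Hypothetical toy data: one predictor, favorite program category as the label
rows = [{"mailing_state_province": p} for p in ["ON", "ON", "QC", "QC", "BC"]]
labels = ["Support Groups", "Support Groups", "Exercise", "Exercise", "Support Groups"]
print(gini(labels))  # impurity before splitting
print(best_categorical_split(rows, labels, "mailing_state_province"))  # ('QC', 0.0)
```

The split with the lowest weighted impurity (here, QC vs. everyone else) becomes the node's rule.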

Relevance:

Wellspring managers can benefit from knowing members’ program preferences by:

  1. recommending programs to members that they are more likely to be interested in

  2. tailoring particular program types to approach certain groups of members.

Model data summary

  • Predictor variables:
    • gender
    • age_years
    • parent_of_a_child_under_18
    • i_identify_as_lgbtq
    • i_identify_as_poc
    • mailing_state_province
  • Response variable:
    • fav_program_category: each member’s favorite program category, obtained by first transforming the service table and then joining the result with the member background table

Result

Interpretation and model assessment

The tree achieves an accuracy of 0.6286, which is higher than the accuracy of always guessing the most prevalent category (0.6020).

Since misclassifications in this task have no obviously unequal costs, I weight all errors equally and use overall accuracy. The tree outperforms both random guessing and guessing the most frequent category, but only by about 2.7 percentage points, so it is still not very accurate. This suggests two possibilities: (1) the variables used are not good indicators of program preference, or (2) the model is too simplistic to capture the relationship.
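The 0.6020 baseline above is the accuracy of always predicting the most frequent category, which equals that category's share of the data; uniform random guessing among k categories is expected to be right only 1/k of the time. A minimal Python sketch with hypothetical labels and predictions (not Wellspring data):

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent category."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)

def uniform_guess_accuracy(y_true):
    """Expected accuracy of guessing uniformly among the observed categories."""
    return 1 / len(set(y_true))

# Hypothetical labels and predictions
y_true = ["Exercise"] * 6 + ["Creative Arts"] * 3 + ["Support Groups"]
y_pred = ["Exercise"] * 9 + ["Support Groups"]
print(accuracy(y_true, y_pred))            # 0.7
print(majority_baseline_accuracy(y_true))  # 0.6
print(uniform_guess_accuracy(y_true))
```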

Data summary for research question 2

attendance_status:

  • Focused only on members marked as “Present” (attended a program) and “Unexcused Absence” (did not attend).

  • Used to identify members who successfully attended a service after registration.

member_start_year & member_start_month:

  • Combined to determine each member’s registration date.

  • Used to split the dataset into two groups:
    • Pre-March 2024 (before system change)
    • Post-March 2024 (after system change)

delivery_year, delivery_month, delivery_day:

  • Used to build each member’s first attended service date.
  • Calculated number of days between registration and first attendance.
  • Flagged members whose first attendance happened within 90 days of registration as early attendees.
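A minimal Python sketch of this date calculation, under two assumptions not stated in the data summary: registration dates are placed on the 1st of the start month (only year and month are recorded), and the 90-day window includes day 90. The function and constant names are hypothetical.

```python
from datetime import date

EARLY_WINDOW_DAYS = 90  # "early" = first attendance within 90 days of registration

def registration_date(start_year: int, start_month: int) -> date:
    # Assumption: only member_start_year/member_start_month exist, so use the 1st of the month
    return date(start_year, start_month, 1)

def first_attendance_date(deliveries):
    """deliveries: (delivery_year, delivery_month, delivery_day) tuples for 'Present' records."""
    return min(date(y, m, d) for y, m, d in deliveries)

def days_to_first_attendance(start_year, start_month, deliveries) -> int:
    return (first_attendance_date(deliveries) - registration_date(start_year, start_month)).days

def is_early_attendee(start_year, start_month, deliveries) -> bool:
    # Assumption: the window is inclusive, and negative gaps (data errors) are not "early"
    return 0 <= days_to_first_attendance(start_year, start_month, deliveries) <= EARLY_WINDOW_DAYS

print(days_to_first_attendance(2024, 4, [(2024, 6, 15), (2024, 5, 2)]))  # 31
print(is_early_attendee(2023, 1, [(2023, 5, 1)]))  # False (120 days)
```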

Testing the Impact of Registration System Change on Early Attendance

Is the proportion of members attending their first program within 3 months of registration higher after the system change (post-March 2024) compared to before?

Method:

Hypothesis Test for Two Proportions

Relevance to Wellspring:

  • Evaluates whether the new registration system improved early engagement.

  • A more accessible system may reduce entry barriers, especially for older users.

  • Findings can inform future outreach and retention strategies.

Method breakdown

A. Data Preparation:

  • Filtered data to include members who attended a service (status = “Present”).

  • Created pre- and post-change groups based on member_start_month and member_start_year, using March 2024 as the cutoff.

  • Calculated the number of days between each member’s registration date and their first attended program.

B. Group Classification:

  • Defined “early attendance” as attending a program within 90 days (3 months) of registration.

  • For each group (pre and post), calculated the proportion of members with early attendance.
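The group split and per-group proportions can be sketched in Python as follows. Treating March 2024 registrations as post-change is an assumption, since the document does not say which group the cutoff month belongs to; the member tuples are hypothetical.

```python
CUTOFF = (2024, 3)  # Assumption: registrations in March 2024 or later are "post-change"

def change_group(start_year: int, start_month: int) -> str:
    return "post" if (start_year, start_month) >= CUTOFF else "pre"

def early_attendance_proportions(members):
    """members: iterable of (member_start_year, member_start_month, is_early) tuples.
    Returns {group: (n_early, n_members, proportion)}."""
    tallies = {"pre": [0, 0], "post": [0, 0]}
    for year, month, early in members:
        counts = tallies[change_group(year, month)]
        counts[0] += int(early)
        counts[1] += 1
    return {g: (e, n, e / n if n else float("nan")) for g, (e, n) in tallies.items()}

# Hypothetical members
members = [(2023, 11, True), (2024, 1, False), (2024, 2, True), (2024, 3, True), (2024, 6, False)]
print(early_attendance_proportions(members))
```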

C. Hypothesis Testing:

  • Null Hypothesis (H₀): No difference in proportions (P₁ = P₂), where P₁ and P₂ are the pre- and post-change early-attendance proportions.

  • Alternative Hypothesis (H₁): The post-change proportion is greater than the pre-change proportion (P₁ < P₂).

  • Used prop.test() function in R for comparing two proportions.
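For reference, a one-sided pooled two-proportion z-test is sketched below in Python. With the same one-sided alternative (e.g., prop.test(c(x_post, x_pre), c(n_post, n_pre), alternative = "greater", correct = FALSE)), it should match R's prop.test(); prop.test() applies Yates' continuity correction by default, so its p-value will differ slightly. The counts in the usage lines are hypothetical, not Wellspring data.

```python
from math import erf, sqrt

def two_prop_z_test_greater(x_post, n_post, x_pre, n_pre):
    """One-sided pooled z-test of H1: p_post > p_pre (no continuity correction).
    Returns (observed difference p_post - p_pre, z statistic, p-value).
    Undefined (division by zero) if all or none of the members attended early."""
    p_post, p_pre = x_post / n_post, x_pre / n_pre
    pooled = (x_post + x_pre) / (n_post + n_pre)
    se = sqrt(pooled * (1 - pooled) * (1 / n_post + 1 / n_pre))
    z = (p_post - p_pre) / se
    p_value = 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail standard normal probability
    return p_post - p_pre, z, p_value

# Hypothetical counts: 60/100 early after the change vs 40/100 before
diff, z, p = two_prop_z_test_greater(60, 100, 40, 100)
print(f"difference={diff:.3f}, z={z:.3f}, p={p:.4f}")
```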

D. Visualization:

  • Side-by-side bar chart showing % of early attendees:
    • X-axis: “Before March 2024” vs. “After March 2024”
    • Y-axis: Percentage of early attendance
    • Title: Impact of Registration System Change on First Attendance

Result and interpretation

Interpretation:

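  • The observed difference in proportions is about -0.0066 with a one-sided p-value of 0.966, so we fail to reject H₀: there is no statistically significant evidence that more members attended early after the system update.

  • We therefore cannot conclude that the simplified system improved user accessibility or encouraged quicker engagement.

  • A benefit for specific groups, such as older members who may have struggled with the old system, is not ruled out by this overall test and would need a separate analysis.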

[1] "Observed Difference in Proportions: -0.00657078071312747"
[1] "P-value: 0.966"
[1] "Fail to reject the null hypothesis: No significant evidence of an increase in attendance within 1 month."

Data summary for research question 3

age: age of a member.

last_service_date_year: Year in which the member last attended a service.

last_service_date_month: Month in which the member last attended a service.

Filtered the data to include only members whose last service date (last_service_date_year and last_service_date_month) falls within the past 12 months.
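The 12-month activity filter can be sketched with month arithmetic in Python. The reference month (e.g., the data extraction date) is not stated in the document, so it is a parameter here, and counting the reference month itself as the most recent of the 12 months is an assumption.

```python
def months_before(year: int, month: int, ref_year: int, ref_month: int) -> int:
    """Whole calendar months from (year, month) up to the reference month."""
    return (ref_year - year) * 12 + (ref_month - month)

def is_active(last_year: int, last_month: int, ref_year: int, ref_month: int, window: int = 12) -> bool:
    """True if the last service month is one of the `window` months ending at the reference month."""
    gap = months_before(last_year, last_month, ref_year, ref_month)
    return 0 <= gap < window

# Hypothetical reference month: March 2025
print(is_active(2024, 4, 2025, 3))  # True  (11 months earlier)
print(is_active(2024, 3, 2025, 3))  # False (12 months earlier)
```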

Estimating Median Age of Active Members Using Bootstrapping

What is the estimated range of the median age for currently active members (attended a service within the last 12 months)?

Method:

Bootstrap Confidence Interval

Relevance:

  1. Identify the core patient groups among Wellspring’s currently active users, to better meet their demands and needs.
  2. Adjust and specialize the services provided to the target age group.

Method breakdown

A. Data Preparation:

  • Filtered the dataset to members active in the past 12 months (using the last-service month and year variables).
  • Selected the age_year variable and removed any missing values (NA).

B. Bootstrapping Process:

  1. Resampled the age data with replacement (replace = TRUE) 500 times.
  2. Computed the median age of each resample.
  3. Constructed a 95% confidence interval from the distribution of the 500 bootstrap medians.
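The bootstrap procedure above can be sketched in Python, assuming the interval is taken from the 2.5th and 97.5th percentiles of the bootstrap medians (the document does not name the interval method). It uses a simple index-based percentile, so results can differ slightly from R's quantile(), which interpolates by default. The ages in the usage lines are simulated, not Wellspring data.

```python
import random
import statistics

def bootstrap_median_ci(ages, n_boot=500, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for the median.

    Draws n_boot resamples of len(ages) values with replacement, takes each
    resample's median, and returns (lower, upper, sorted_medians).
    """
    rng = random.Random(seed)
    n = len(ages)
    medians = sorted(statistics.median(rng.choices(ages, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lower = medians[int(alpha * (n_boot - 1))]
    upper = medians[int((1 - alpha) * (n_boot - 1))]
    return lower, upper, medians

# Simulated ages, for illustration only
rng = random.Random(0)
ages = [rng.randint(20, 90) for _ in range(300)]
lower, upper, medians = bootstrap_median_ci(ages, seed=42)
print(f"95% bootstrap CI for the median age: [{lower}, {upper}]")
```

The sorted `medians` list is also what the histogram of the bootstrap distribution would be drawn from.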

C. Visualized the bootstrap distribution of medians with a histogram.

Result and interpretation

Interpretation:

  • The 95% bootstrap confidence interval for the median age of active members is [56, 58]; the true median likely falls within this range.
  • This indicates a predominantly older demographic among Wellspring’s active members.

Graph of distribution:

Limitations

  • A large proportion of member background demographic information is missing. For example, marital_status is populated for fewer than 100 of the 4,800 observations.

  • Limited access to full longitudinal data: we only examined members’ first 3 months after registration, which may miss delayed engagement or long-term patterns.

  • Lack of detailed demographic information: More nuanced variables such as socioeconomic status, digital literacy, or transportation access would help explain patterns in age-related service use or attendance timing.

  • No control over external factors: Variables like seasonal trends, specific programming changes, or public health conditions (e.g., COVID-19 surges) are not accounted for but may impact attendance and usage rates.

Overall Conclusions

Research Question 1: Can we predict a member’s favorite program type from self-reported demographic information?

Wellspring managers can benefit from knowing members’ program preferences by recommending programs that members are more likely to be interested in and by tailoring particular program types to reach certain groups of members.

The tree outperforms both random guessing and guessing the most frequent category. However, it is still not very accurate, which points to at least three areas for improvement: (1) the variables used may not be good indicators of program preference; (2) the model may be too simplistic to capture the relationship; (3) the demographic variables have many missing values, which makes the model hard to apply to the whole membership.

Research Question 2:

Did the registration system change improve early attendance?

Using a permutation-based hypothesis test for two proportions, we found no statistically significant evidence (p = 0.966) that members who registered after the March 2024 system change were more likely to attend their first service early than members who registered before it. The data therefore do not support the conclusion that the new system improved user accessibility or reduced barriers to engagement.

Research Question 3:

Estimating Median Age of Active Members Using Bootstrapping

  • The true median age of active members likely falls within [56, 58]. In statistical terms, we are 95% confident that this interval captures the true median.

  • This indicates a predominantly older demographic among Wellspring’s active members.

  • Wellspring may therefore want to adjust its services to the needs of middle-aged and older members, in order to better serve the majority of its patients.